Abstract. When stereoscopic images are presented alternately to the two eyes, stereopsis occurs 
at F > 1 Hz full-cycle frequencies for very simple stimuli, and F > 3 Hz full-cycle frequencies for 
random-dot stereograms (eg Ludwig I, Pieper W, Lachnit H, 2007 "Temporal integration of monocular 
images separated in time: stereopsis, stereoacuity, and binocular luster" Perception & Psychophysics 
69 92-102). Using twenty different stereograms presented through liquid crystal shutters, we studied 
the transition to stereopsis with fifteen subjects. The onset of stereopsis was observed during 
a stepwise increase of the alternation frequency, and its disappearance was observed during a 
stepwise decrease in frequency. The lowest F values (around 2.5 Hz) were observed with stimuli 
involving two to four simple disjoint elements (circles, arcs, rectangles). Higher F values were needed 
for stimuli containing slanted elements or curved surfaces (about 1 Hz increment), overlapping 
elements at two different depths (about 2.5 Hz increment), or camouflaged overlapping surfaces 
(> 7 Hz increment). A textured cylindrical surface with a horizontal axis appeared easier to interpret 
(5.7 Hz) than a pair of slanted segments separated in depth but forming a cross in projection (8 Hz). 
Training effects were minimal, and F usually increased as disparities were reduced. The hierarchy of 
difficulties revealed in the study may shed light on various problems that the brain needs to solve 
during stereoscopic interpretation. During the construction of the three-dimensional percept, the 
loss of information due to natural decay of the stimuli traces must be compensated by refreshes 
of visual input. In the discussion an attempt is made to link our results with recent advances in the 
comprehension of visual scene memory. 

Keywords: stereopsis, temporal integration, slant, shear disparities, processing times, stereoscopic memory. 

1 Introduction 

1.1 Origin of the work 

In clinical ophthalmology a classical treatment for reeducating patients with a lazy (amblyopic)
eye uses pairs of images that can be fused. The idea is to present the images in 
alternation to the two eyes, with a specialized apparatus called a synoptophore. Typically at 
a 2 Hz alternation frequency each image is presented for 250 ms, and the patient with a lazy 
eye perceives two images in alternation, rather than a single stable or blinking image. The 
alternation frequency is then raised. Up to 4 Hz the patient usually perceives two alternating 
images. Above 4 Hz the lazy eye's image is perceived with a lag, and at around 6-7 Hz this 
image completely disappears (SR; data not shown). With training, the patient becomes able to 
perceive the two images at increasingly high alternation frequencies, until he or she hopefully 
succeeds in fusing the two images. 
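
The relation between alternation frequency and exposure time quoted above (2 Hz, 250 ms per image) can be sketched as follows. This is a minimal illustration, assuming that the two monocular exposures split the full cycle evenly with no gap between them; the function name `exposure_ms` is ours and is not part of any apparatus software.

```python
def exposure_ms(full_cycle_hz: float) -> float:
    """Per-eye exposure in ms for a given full-cycle alternation frequency.

    Assumption: the left and right exposures share the cycle equally,
    with no blank or binocular interval between them.
    """
    if full_cycle_hz <= 0:
        raise ValueError("frequency must be positive")
    return 1000.0 / (2.0 * full_cycle_hz)


# Frequencies mentioned in the text for the synoptophore treatment.
for f in (2, 4, 6, 7):
    print(f"{f} Hz -> {exposure_ms(f):.1f} ms per eye")
```

Under this assumption, raising the frequency from 2 Hz to 4 Hz halves the time each eye has to register its image, from 250 ms to 125 ms.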

One of us (SR) was initially interested in monitoring the progress made by strabismic 
patients in the course of orthoptic treatments, and added stereoscopic tests to the commonly 
used fusion tests. It soon became apparent that patients who were able to interpret a certain 
stereoscopic pair in three dimensions at a certain alternation frequency on the synoptophore 
could require a significantly higher alternation frequency for another stereo pair, independently 
of the disparity ranges used in the stereograms. Differences were found between quite 
simple stereograms, for instance between the first two and the last two stereograms in 
figure 1, which had been designed for strabismic patients with inconstant squint (Rychkova 
and Ninio 2009). Subsequently, differences in alternation frequencies were also observed 
with subjects who had normal vision. Whereas earlier comparative studies on stereogram 
perception often focused on complex images that required large latencies in order to be 
perceived (see section 1.2 below), here our criterion seemed to be exquisitely sensitive to 
subtle differences between simple related stereograms. 

In this paper we report our observations on alternating frequency thresholds involving a 
set of twenty different stereograms, organized in five blocks of four images each. The first 
three blocks had already been used in our previous study, and were found to provide a nicely 
graded range of difficulties (Rychkova and Ninio 2009). The stereograms that were perceived 
with the greatest difficulty under static viewing conditions in our earlier work turned out to 
be those that required the highest alternation frequencies in the present work. The last two 
blocks of images were included here to extend our exploration of stereograms based on very 
simple patterns. While it is tempting to advance, as a first approximation, that processing 
latencies in static displays measure the same properties as frequency thresholds in alternating 
presentations (see sections 1.2 and 1.3 below), this is an oversimplification (section 1.4 below). 
As will be stressed in section 4.3 of the discussion, our results on alternation frequency 
thresholds require a refining of the common conceptions of memory's role in stereoscopic 
processing. 

1.2 Processing times in simultaneous presentations 

So far, most comparative studies on stereoscopic difficulties in stereograms have used static 
stimuli that required large latencies to be interpreted in depth. For instance, in their study 
of slant perception and the primitives of stereopsis, Gillam et al (1988) used random-dot 
stereograms (RDS) and found response times ranging from about 7 s for twists about a vertical 
or a horizontal axis to 35-38 s for whole-field slants or hinges around a vertical axis. In their 
comparative studies on linear textures, for stereograms representing five hemiellipsoids, 
Ninio and Herlin (1988) found response times in the 10 s to 30 s range, depending on 
the texture. Interpretation times of < 3 s are more commonly reported, even on the first 
presentation of a RDS to naive subjects (Bradshaw et al 1995). Learning effects are important 
(eg Frisby and Clatworthy 1975; Ramachandran 1976). While it took about 1.5 min for an 
average subject to interpret the famous 'hyperbolic paraboloid' of Julesz (1971) the first time 
it was seen, the perception time was reduced to just 1-2 s after six or seven exposures to the 
same pattern (Ramachandran 1976). 

Other experiments provide very different time scales for stereoscopic processing. Using 
an indirect argument, Julesz (1964) proposed a 50 ms processing time for an RDS. In more 
recent work, with dynamic RDS, values in the 130-290 ms range or the 50-80 ms range have 
been proposed, depending on the velocity of the moving patterns studied (Rosenzweig et al 
2002; Wolf et al 1996). It would seem that, when a simple surface (a rectangle) is encoded 
in an RDS and the subjects merely have to state whether the surface is above or below 
background, the response time can be 100 ms or less, while, when a more complex surface 
must be perceived, appreciably larger response times are found. We wish to stress here a 
common but perhaps frail observation — that stereoscopic processing would be slower when 
stereograms use larger disparities (eg Goriyu and Kikuchi 1971). 

1.3 Temporal processing in alternative presentations 

Stereopsis can occur even when the two images are presented with a delay between the two 
eyes (Exner 1875). More precisely Ogle (1963) found that exposure times of 18 ms could be 
separated by delay times of 100 ms. By using cyclic presentations (left eye exposure, delay, 
right image exposure, delay) he was able to study the time constraints in a flexible manner. 
He found that, when exposing the subjects to these repeated cycles for a total of one second, 
the delay time could be increased by about 50 ms at constant exposure times. The subject, he 
reported, did not need more than four or five cycles to develop a depth interpretation at 100 
ms exposure times and 70 ms void interval. Thereafter, many studies using cyclic alternative 
presentations were carried out, focusing on the alternation frequency (the reciprocal of the 
total cycle duration) at which stereopsis occurred. For instance, Ludwig et al (2007) observed 
stereopsis at 1 Hz for stimuli composed of three vertical rods. With RDS they could not 
observe stereopsis below 3 Hz, while binocular luster could be observed only above 10 Hz. In 
agreement with Ogle (1963) or Herzau (1976), these authors found that smaller disparities 
require higher alternation frequencies. 

1.4 Do latencies (in static presentations) and frequencies (in alternative presentations) 
measure the same properties? 

It seems obvious that stereograms presenting difficulties for stereoscopic processing should 
take a long time to be interpreted in three dimensions. However, there would also be cases 
in which processing takes a long time because the establishment of a complete accurate 
interpretation is tedious, rather than caused by a difficult stereoscopic problem. In the case of 
alternating presentations, one is tempted to assume that the higher the required alternation 
frequency, the more difficult the underlying stereoscopic processing problem. The distinction 
between very simple line stimuli that may require only 1 Hz and RDSs that require 3 Hz or 
more points in the direction of this assumption, as do the detailed results presented in this 
paper. 

It is commonly argued (see Howard and Rogers 1997, page 186) that stereoscopic 
processing can occur under the alternating presentation modality because "signals from 
one eye persist long enough to interact with those from the other." Note, however, that 
the three-dimensional interpretation builds up over several successive cycles. Therefore, 
what is typical of stereograms that require high alternation frequencies is the need for 
frequent updating of two-dimensional information — a notion that does not coincide with 
the common interpretation of latencies for statically presented stereograms just stated. 
This consideration draws attention to the potential of the alternation frequency threshold 
technique for complementing other methods customarily used for studying stereoscopic 
processing. The goal of this paper is to demonstrate this potential in an exploratory study 
using a wide range of different stereograms. 

2 Materials and methods 

2.1 Subjects 

The twenty-seven subjects in our study (twelve males, fifteen females; average age = 27 years) 
had normal vision (1.0/1.0 acuities) in both eyes without correction (nineteen subjects) or 
after correction with refractive lenses (eight subjects). All had normal responses to standard 
clinical tests for binocular vision (Worth, Lang, and Fly tests). 

2.2 Apparatus 

A first part of the work, involving twelve subjects, was performed with a synoptophore, an 
apparatus classically used in clinical ophthalmology (see the supplementary material, part 
S1). We concentrate here on the data acquired on fifteen other subjects, using liquid crystal 
shutters designed by P Chaumont (Chaumont et al 1982), and produced in small series 
in Russia. These shutters were designed for orthoptic work and are used for developing 
fusion in strabismic children (Rychkova et al 2001). A recent model was used, in which 
exposure times could be varied from 10 ms to 5 s in steps of 10 ms. The field of view is 
opalescent for the eye that does not receive an image. A normal cycle with these shutters 
involves alternative exposures to the two eyes, separated by binocular exposure intervals. 
In this work the binocular interval was set to its present minimum value of 10 ms. We also 
wrote an interactive graphics program for presenting our stereograms as anaglyphs on a 
computer screen, and running alternative presentation experiments. The source code and 
the instruction guide for this program are supplied in the supplementary material, parts S3 
and S4. 

2.3 Stimuli 

The stereograms used for the tests were designed to probe various aspects of stereoscopic 
vision. They were grouped into five blocks, each block containing four stereograms representing 
different objects. The stereograms are shown in their standard version in figures 1-5. 
These were complemented by some variants with smaller or higher disparities. Furthermore, 
by exchanging the left and right images of each stereogram, variants with opposite depth 
signs were systematically studied. 

For the LCD experiments, each image was printed with a width of 2.5 cm then pasted 
on a plastic holder mounted on a graduated slide. The subject's eyes were 25 cm away from 
the stimuli. The subtended angle was about 5.7 deg. The separation between the centres 
of the two images was chosen between 58 mm and 63 mm to suit best the natural viewing 
mode of the subject at the chosen viewing distance (25 cm). The two images were physically 
separated by cardboard affixed perpendicularly to the plane of the images so that each eye 
viewed a single image. Maximum disparities across the range of stereograms were roughly 
comparable but this factor was not precisely controlled in this exploratory study. The reader 
can judge their relative sizes by fusing the stereograms shown in the various figures reported 
here. 
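
The subtended angle quoted above follows from the image width and the viewing distance. As a quick check, here is a minimal sketch using the standard visual-angle formula; the function name `visual_angle_deg` is ours, introduced only for illustration.

```python
import math


def visual_angle_deg(size_cm: float, distance_cm: float) -> float:
    """Visual angle (degrees) subtended by an object of given width
    viewed frontally at a given distance: 2 * atan(size / (2 * distance))."""
    return math.degrees(2.0 * math.atan(size_cm / (2.0 * distance_cm)))


# Image width 2.5 cm, viewing distance 25 cm (values from section 2.3).
angle = visual_angle_deg(2.5, 25.0)
print(f"{angle:.2f} deg")
```

The result rounds to the value of about 5.7 deg given in the text.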

2.4 Procedures 

All subjects first practised with an easy stereogram representing a small central disk in front 
or behind a larger disk (Rychkova and Ninio 2009, figure 2e). The instructions to subjects are 
given below when we describe the typical sequence of percepts from alternating left-right 
stereo image presentations and the criterion that subjects were required to use in making 
their threshold judgments. They were then tested on the twenty stereograms in both their 
standard form and their inverted depth form. The presentation order was randomized as 
follows: the stereograms were presented block by block, and the order of presentation of 
the blocks was drawn at random. Within each block the order of presentation of the four 
stereograms in the block was drawn at random, and the two depth forms of each stereogram 
were presented in random order. The alternation frequencies were varied in ascending mode 
from 2 Hz to 20 Hz until stable stereopsis was reached, and in descending mode until the 
two images were seen monocularly in alternation. For each stimulus both the ascending and 
the descending modes were used, in random order. 
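
The nested randomization described above can be sketched as follows. This is only an illustrative reconstruction, not the procedure actually used to draw the orders: the labels, the function name `presentation_order`, and the choice to nest the ascending/descending mode inside each depth form are our assumptions.

```python
import random


def presentation_order(rng: random.Random) -> list[tuple[str, str, str]]:
    """Return one series of (stereogram, depth form, frequency mode) trials.

    Blocks are shuffled, then stereograms within each block, then the two
    depth forms of each stereogram, then (assumption) the two frequency
    modes within each depth form.
    """
    blocks = ["A", "B", "C", "D", "E"]
    rng.shuffle(blocks)
    trials = []
    for block in blocks:
        stereograms = [f"{block}{i}" for i in range(1, 5)]
        rng.shuffle(stereograms)
        for stereogram in stereograms:
            depths = ["standard", "inverted"]
            rng.shuffle(depths)
            for depth in depths:
                modes = ["ascending", "descending"]
                rng.shuffle(modes)
                for mode in modes:
                    trials.append((stereogram, depth, mode))
    return trials


# One complete series: 20 stereograms x 2 depth forms x 2 modes.
series = presentation_order(random.Random(0))
print(len(series), series[:4])
```

Three such series per subject, each drawn with a different seed, would yield the twelve determinations per stereogram that are averaged in section 2.5.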

In both the synoptophore and the LCD work, each subject went through three complete 
series of tests, with different presentation orders, over three to six different days. At each 
frequency change there was a pause of a few seconds. To avoid visual fatigue the tests were 
divided into several 30-40 min sessions, interrupted by pauses of about 30 min to 60 min. 
In the case of the synoptophore studies, the tests included variants with lower or higher 
disparities of each of the four stereograms in blocks 1 and 3 (see the supplementary material). 

2.5 Statistics 

For each stereogram in this study and each subject we take the threshold frequency averaged 
over the twelve tests (ascending or descending frequency mode, normal or inverted depth, 
each condition tested three times) as a single result. For each stereogram we then have fifteen 
threshold frequencies — that is, one for each subject. The means (m) and standard deviations 
(sd) within this set of fifteen values are shown in figures 1-5. Assuming normally distributed 
variables (which is not strictly the case), the 95% confidence interval (see, for instance, Dixon 
and Massey 1957) corresponds to the mean ± sd multiplied by 1.96 and divided by $\sqrt{N}$, the 
square root of the number of measures. Here, N = 15 when the mean is taken over a single 
stereogram, or N = 15k when the mean is taken over k stereograms. For N = 15, the factor 
$1.96/\sqrt{N}$ equals 0.506, and the corresponding factor for the 99% confidence interval 
($2.58/\sqrt{N}$) equals 0.666. We will therefore consider that a difference between two means taken over 
fifteen values is significant to the 95% level when it is larger than half the sum of the standard 
deviations. Similarly, the same difference is significant to the 99% level when it is larger than 
2/3 of the sum of the standard deviations. A referee has suggested that the design of our 
study makes it open to a repeated-measures statistical analysis as data were collected for all 
conditions from each subject. We prefer the approach we have just outlined which, we note, 
happens to be a more cautious approach by assuming that data collected across conditions 
from different subjects are independent rather than correlated in some ill-defined way. 
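
The significance criterion described above can be written out as a short sketch. The function names are ours, and we take 2.58 as the normal quantile for the 99% level, an assumption that is consistent with the 0.666 factor quoted in the text. The code uses the exact factors rather than the rounded values of 1/2 and 2/3.

```python
import math

Z95 = 1.96  # normal quantile for the 95% confidence interval
Z99 = 2.58  # normal quantile for the 99% confidence interval (assumed)


def ci_factor(z: float, n: int) -> float:
    """Factor by which sd is multiplied to get the confidence half-width."""
    return z / math.sqrt(n)


def significance(m1: float, sd1: float, m2: float, sd2: float, n: int = 15) -> str:
    """Compare two means: the difference is called significant when it
    exceeds the sum of the two confidence half-widths."""
    diff = abs(m1 - m2)
    sd_sum = sd1 + sd2
    if diff > ci_factor(Z99, n) * sd_sum:
        return "99%"
    if diff > ci_factor(Z95, n) * sd_sum:
        return "95%"
    return "ns"


# Means and standard deviations taken from figures 1 and 2.
print(significance(2.6, 0.26, 3.6, 0.90))  # A1 versus A3
print(significance(6.7, 1.58, 5.7, 1.26))  # B3 versus B4
```

With the values reported in the figures, the first comparison passes the 99% criterion and the second does not reach the 95% criterion, in line with the statements made in section 3.2.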

3 Results 

3.1 General observations 

When stereograms are presented at low alternation frequencies, the subject perceives the 
left and the right images in alternation. When the alternation frequency increases, there 
is a point at which the two images fuse, and this can happen in two ways. A subject may 
perceive a flickering image which seems intermediate between the left and right images or 
he or she may experience a form of unstable stereopsis: the stereogram is interpreted in 
three dimensions, but the relative locations of parts of the image are alternately those of 
the left image and those of the right image, giving an apparent rocking motion effect. As 
the alternation frequency is raised, the apparent motion stops and the subject perceives a 
three-dimensional image (flickering or stable) with stable geometry. The measurements we 
report here relate to this transition to or from stable stereopsis. Specifically, subjects were 
instructed to state when stable stereopsis became established, and the threshold was the 
frequency at which this judgment occurred. 

In these experiments, for each stereogram and each subject we collected twelve transition 
frequencies (ascending or descending frequency modes, standard or inverted disparities; 
three determinations for each of the four cases). The dispersion between the twelve values 
was rather small: training effects were minimal, and threshold frequencies slightly increased 
as disparities were reduced (for details, see the supplementary material, part S2). We will 
therefore discuss the averages of the twelve frequencies, first block by block (section 3.2), 
then in a more synthetic fashion (section 3.3). 

In response to the comments of a referee, we wish to make clear that our primary 
focus in the discussion that follows is to argue that the results reveal the potential of 
the alternating frequency threshold technique for exploring what we will call for short 
'stereoscopic difficulty'. Our stimuli varied on a number of dimensions — the nature of the 
three-dimensional surface portrayed, the complexity of the texture carrying disparity cues, 
the presence or absence of monocular regions, etc. Hence we do not try to draw definitive 
conclusions from our study about what stimulus factors may or may not have caused the 
observed variations in alternation frequency thresholds for our various stimulus conditions. 
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Rather, we use the results discursively to debate possible factors that may have been picked 
up by the alternating frequency threshold technique. We hope others will use our study 
as a starting point for their own use of this technique, using precisely controlled stimulus 
variations to suit their own experimental objectives. 

[Figure 1 panels, with mean (m) and standard deviation (sd) of the threshold frequency]
A1: frontoparallel vertical bar, m 2.6 Hz / sd 0.26 Hz
A2: frontoparallel horizontal bar, m 2.7 Hz / sd 0.26 Hz
A3: slanted vertical bar, m 3.6 Hz / sd 0.90 Hz
A4: slanted horizontal bar, m 3.8 Hz / sd 1.27 Hz

Figure 1. The four stereograms of block A represent a frontoparallel or a slanted rectangle enclosed by 
two arcs defining a zero disparity background. Each stereogram was presented as shown here, as well 
as in the inverted depth variant, in which the left and the right images were switched. The average 
threshold frequencies and standard deviations for the transition to stable stereopsis are given in Hz 
below each figure. 



3.2 Results per block 

Block A. These stereograms (figure 1) were designed to be particularly easy to interpret. Each 
pair contains three conspicuous shapes that are planar in three dimensions: two thick arcs 
that form a zero disparity reference frame, and one rectangle. If stereoscopic processing 
needs only to assign a depth to the four apexes of the rectangles, then all the stereograms 
in this block should be of nearly equal difficulty. The 'vertical' rectangle in stereogram A3 
is in fact slanted about a horizontal axis: this can be deduced in monocular vision from the 
different orientations of the rectangles in the left and right images. The horizontal rectangle 
in stereogram A4 is in fact slanted about a vertical axis, and this can be deduced in monocular 
vision from the length difference between the rectangles in the left and the right images. In 
stereograms A1 and A2 the rectangles are frontoparallel: these stereograms can be taken as 
controls for A3 and A4. 

The four stereograms were used in our previous study on paradoxical fusion with a 
single eye (Rychkova and Ninio 2009). In this case there were clear-cut results. Most subjects 
(twenty-six to twenty-eight out of thirty) could detect depth in the case of the frontoparallel 
rectangles. About half (fourteen to seventeen out of thirty) could detect depth in the case of 
the slanted rectangles, but the slanted aspect was reported by only seven subjects and, even 
with these seven, not systematically. 

Here, with our subjects (who all had normal vision), we observe a difference in average 
threshold frequencies between the frontoparallel rectangles (2.6 Hz and 2.7 Hz) and the 
slanted ones (3.6 Hz and 3.8 Hz). Before reaching the frequency at which they perceived the 
slant in A3 and A4, a few subjects perceived these rectangles as frontoparallel. The difference 
in threshold frequencies between the frontoparallel rectangles (A1 and A2) and the slanted 
ones (A3 and A4) is significant at the 99% confidence level. 

Using complex stereograms of the random-dot kind, Gillam et al (1988) found a very 
significant difference between the latency for a hinge about a horizontal axis (13.8 s) and that 
for a hinge around a vertical axis (37.7 s). We find a very slight effect in the same direction 
with our simplified stimuli, but in our case it is not statistically significant. 

Block B. The four stereograms in this block (see figure 2) use textured grids that form 
curved surfaces in three dimensions. Such stereograms were designed to provide a flexible 
alternative for RDS (Ninio 1981, 2007). They contain contours at all orientations, a feature 
that may activate a stereoscopic pathway mediated by the oriented edge detectors in V1 and 
V2. These distorted grid stereograms are usually easier to see than RDS. However, they are 
not suited to represent surfaces with depth discontinuities (by which we mean step changes 
in depth), and they carry compression cues that can be detected under careful examination. 
Stereogram B1 represents a hemisphere, and in monocular vision a reasonable guess can be 
made about its three-dimensional shape because the circular contour matches the rim of 
the hemisphere. Stereogram B2 also represents a hemisphere but with a convexity reversal 
added in the centre. A subject who perceives the global spherical shape may still fail to detect 
the slight depression at the centre. B2 therefore requires more precision in the subject's 
stereoscopic reconstruction than B1. Stereogram B3 represents a cylinder with a vertical axis, 
flanked with zero disparity extensions on both sides. These extensions were incorporated 
to create clear disparity signals at the edges of the cylinder. Stereogram B4 represents a 
cylinder with a horizontal axis. The cylinder's contours may provide clues to its shape, and 
the stereogram should be easy to interpret in depth, due to its simple shear disparity pattern. 

The four stereograms in this series were used in our previous work (Rychkova and Ninio 
2009). All patients could see a convex or concave shape in the B1 hemisphere, but only half of 
them (thirteen to fifteen out of twenty-four) could detect convexity or concavity in the three 
other stimuli. 

In the present work, as in our previous work, B1 emerges as the easiest stimulus in the 
block (3.3 Hz average threshold frequency). The other stimuli are well separated. Stimulus 
B2, which requires precise stereopsis, needed an 8.4 Hz threshold frequency. The vertical 
cylinder in B3 needed 6.7 Hz, and the horizontal cylinder in B4 needed 5.7 Hz. The difference 
in threshold frequencies between B3 and B4 is not significant at the 95% confidence level; all 
other pairwise comparisons are significant at the 99% confidence level. 
There is no new theoretical observation to derive from this set, but there is a practical 
one: the distorted grid textures appear suitable for generating significant differences in 
threshold frequencies between stimuli that represent different shapes. 



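[Figure 2 panels, with mean (m) and standard deviation (sd) of the threshold frequency]
B1: hemisphere, m 3.3 Hz / sd 0.48 Hz
B2: dome with central depression, m 8.4 Hz / sd 2.18 Hz
B3: vertical cylinder, m 6.7 Hz / sd 1.58 Hz
B4: horizontal cylinder, m 5.7 Hz / sd 1.26 Hz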



Figure 2. The four stereograms of block B represent continuous surfaces defined by a distorted grid 
texture. As in figure 1, threshold frequencies and standard deviations are shown. 



Block C. Here, we have a more common class of stereograms (figure 3). C3 and C4 are 
Julesz-type stereograms (also commonly called RDS), made of matrices of black or white 
squares drawn at random. Disparity takes discrete values determined by the side of an 
elementary square. Such stereograms are suitable for representing frontoparallel surfaces 
above or below background. With respect to monocular inspection, camouflaging is perfect 
(unless, due to unskilful programming, some monocular regions are revealed by internal 
repeats), but the shapes can easily be revealed using a physical procedure (representing 
the images on transparent sheets, and sliding one sheet horizontally over the other). These 
stereograms essentially test the capacity of stereopsis to defeat camouflage. C4 represents a 
horizontal and C3 a vertical rectangle, above or below background. Stereograms C1 and C2, 
in which horizontal or vertical rectangles are represented explicitly by turning the black cells 
into grey cells, serve as controls for C3 and C4. 

[Figure 3 panels, with mean (m) and standard deviation (sd) of the threshold frequency]
C1: explicit vertical rectangle, m 5.5 Hz / sd 1.0 Hz
C2: explicit horizontal rectangle, m 4.9 Hz / sd 0.92 Hz
C3: hidden vertical rectangle, m 12.4 Hz / sd 4.0 Hz
C4: hidden horizontal rectangle, m 10.1 Hz / sd 2.10 Hz



Figure 3. The four stereograms of block C represent a frontoparallel rectangle above or below 
background, using Julesz-type random square textures. C3 and C4 are camouflaged, using one of 
Julesz's RDS styles. The rectangles are made explicit in C1 and C2. 



The four stereograms of this block were also used in our previous study, where they 
gave clear-cut results: only two out of twenty-four subjects could perceive depth with the 
camouflaged stereograms C3 and C4, while twenty-three out of twenty-four subjects could 
see depth with the explicit stereograms C1 and C2. In the present study our subjects with 
normal vision successfully passed Lang's RDS test (section 2.1), yet their threshold 
frequencies were quite high with the camouflaged stereograms C3 and C4 (12.4 Hz and 10.1 
Hz, respectively). The threshold frequencies were lower with the explicit stereograms C1 and 
C2 (5.6 Hz and 4.9 Hz, respectively). The difference in measured threshold for stereopsis 
between the horizontal and the vertical camouflaged rectangles is not significant at the 95% 
confidence level. The difference between the camouflaged and the explicit rectangles is 
significant at the 99% confidence level. 

Block D. In blocks A, B, and C the stereograms represented different shapes, but there 
was some uniformity in visual appearance. In block D a set of five points occupies exactly 
the same positions in three-dimensional space in all four stereograms (figure 4). There is a 
central point (actually a small black disk, centrally located) that can be used as a reference, 
and four circles at the apexes of a tetrahedron. The stereograms differ only in how the four 
apexes are connected. In D1 the apexes are unconnected, in D2 they are connected 2 by 2 
with horizontal segments, in D3 they are connected 2 by 2 with vertical segments, and in D4 
they are connected with both horizontal and vertical segments. This is similar in spirit to 
an earlier work in which textures were constructed by identical sets of points connected in 
different ways (Herbomel and Ninio 1993). In the present work all four stereograms turned 
out to be very easy to interpret (from 2.5-2.6 Hz for D1 and D4 to 2.8 Hz for D2 and D3). 

[Figure 4 panels, with mean (m) and standard deviation (sd) of the threshold frequency]
D1: 4 apexes of a tetrahedron, m 2.5 Hz / sd 0.26 Hz
D2: "vertical sides" of a tetrahedron, m 2.8 Hz / sd 0.45 Hz
D3: "horizontal sides" of a tetrahedron, m 2.8 Hz / sd 0.47 Hz
D4: tetrahedron, m 2.6 Hz / sd 0.26 Hz

Figure 4. The four stereograms of block D contain four peripheral elements at the apexes of a 
tetrahedron, and a central reference element. The stereograms differ in the way the peripheral elements 
are connected. 

Block E. The four stereograms in this series explore a blind spot of the current 
stereoscopic doctrine. The important stereogram is E3; the others serve as controls (figure 5). 
Stereogram E3 represents two slanted needles that cross in projection but do not touch in 
three dimensions. Upon monocular inspection, the fact that the needles do not touch can 
be deduced from the fact the crossing points on the left and on the right are at different 
horizontal levels, implying that these points should not be matched in the three-dimensional 
interpretation. The fact that the needles composing the apparent crosses are slanted is 
revealed, under monocular inspection, by their orientation disparity. According to the current 
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E1 



E2 



\s V V V 



frontoparallel 
disjoint needles 

m6.7 Hz/sd1.14Hz 



slanted cross 



m7.0 Hz /sd 1.34 Hz 



E3 



E4 



XX 



disjoint 
slanted needles 

m8.0 Hz /sd 1.79 Hz 



interrupted 
version of E3 

m 2.9 Hz / sd 0.81 Hz 



Figure 5. Stereograms El - E3 of block E contain crosses in two dimensions that define either a slanted 
cross or a pair of needles at different depths. E4 is an interrupted version of E3. 

physiological doctrine about edge detection by neurons in V2, isolated needles should be 
appropriate stimuli for these neurons, but crosses should not. However, how V2 neurons 
should respond to needles that exist as needles only in the three-dimensional interpretation, 
but not in the two-dimensional stimuli, is not known. In E4 the central part of the E3 crosses
was erased so that the images in E4 contain four smaller needles that do not intersect in two
dimensions. In E2 there is a true slanted cross. In E1, as in E3, the segments that cross in
two dimensions do not touch in three dimensions, but, while E3's needles are slanted, E1's
needles are frontoparallel. The four stereograms have quite different thresholds for stereopsis.
While a high 8 Hz frequency is needed for E3, 2.9 Hz is enough for the interrupted E4 variant.
The real slanted cross (E2) needs a slightly higher frequency (7.0 Hz) than the dissociated
frontoparallel needles E1 (6.7 Hz). Comparing E3 with E1 and E2, there is a penalty for slant,
which affects E3 and E2 but not E1, and a penalty for separation in three dimensions, which
affects E3 and E1 but not E2. Both penalties would be at play in E3, making E3 one of the
most difficult stereograms in the whole work in spite of its extreme simplicity in the
two-dimensional projections. Statistically, the pairwise differences between E1, E2, and E3 are
not significant at the 95% confidence level, but all differences with E4 are significant at the
99% confidence level.
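
The two penalties can be made concrete with a little arithmetic on the mean thresholds quoted for block E. The sketch below is only an illustration: the variable names are ours, the differences are simple subtractions of the reported means, and, as noted above, the pairwise differences between E1, E2, and E3 are not statistically significant.

```python
# Mean threshold frequencies (Hz) reported in the text for block E.
thresholds_hz = {"E1": 6.7, "E2": 7.0, "E3": 8.0, "E4": 2.9}

# Slant penalty: E3 (slanted, separated in depth) versus
# E1 (frontoparallel, separated in depth).
slant_penalty = thresholds_hz["E3"] - thresholds_hz["E1"]

# Separation penalty: E3 (slanted, separated in depth) versus
# E2 (slanted, true cross).
separation_penalty = thresholds_hz["E3"] - thresholds_hz["E2"]

# Cost of the two-dimensional crossing: E3 versus its interrupted variant E4.
crossing_cost = thresholds_hz["E3"] - thresholds_hz["E4"]

print(f"slant penalty:      {slant_penalty:.1f} Hz")
print(f"separation penalty: {separation_penalty:.1f} Hz")
print(f"crossing cost:      {crossing_cost:.1f} Hz")
```

Under these figures the slant penalty is about 1.3 Hz, the separation penalty about 1.0 Hz, and removing the crossing lowers the threshold by about 5.1 Hz.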

3.3 Across-block comparisons 

The results for all five blocks are recapitulated in figure 6. The stereograms can be organized 
into six classes according to their associated threshold frequencies (figure 3). 

Class 1 contains the apparently easiest stereograms, with associated frequencies below 3 
Hz. These stereograms represent explicit frontoparallel bars (A1, A2), a tetrahedron defined
by four isolated or connected apexes (D1-D4) or four small isolated slanted needles (E4). 

Class 2 contains three stereograms with associated frequencies between 3 Hz and 4 Hz. 
Stereograms A3 and A4 are slanted versions of A1 and A2. Stereogram B1 represents a textured
curved surface. The matched circular contours give a strong cue about the three-dimensional 
shape. 

Class 3 contains two stereograms that represent rectangles (C1 and C2), but this time
three-dimensional interpretation requires handling monocular regions, either by extracting
them or by adjoining them to the sides of the rectangles [as documented for instance by
Collett (1985); recent review in Harris and Wilcox (2009)]. This factor may have contributed
to the associated threshold frequencies being as high as around 5 Hz. The camouflaged 
stereogram B4 falls in the same frequency range. 

In class 4 (6-7 Hz threshold frequency) we find two new types of difficulties. In 
stereogram E1 we have two frontoparallel needles that need to be separated in depth.
The problem differs from that of extracting monocular regions, and may relate more to 
transparency issues. In stereogram B3, which represents a textured cylinder with a vertical 
axis, the disparity field repeats itself all along the vertical axis (we call this feature 'vertical 
degeneracy'), so there are no shear disparities. 

In class 5 (7-9 Hz threshold frequencies), the case of stereogram E3, representing two 
slanted needles forming an apparent cross, but separated in depth, is straightforward. The 1.3 
Hz increase in frequency with respect to the unslanted counterpart E1 is in the same range
as that observed between slanted and unslanted rectangles (A3 and A4 versus A1 and A2;
average difference 0.9 Hz). Stereogram B2, whose 'dimple in the sphere' requires a detailed 
perception of shape, also falls into this frequency range. 

Finally, the two stereograms C3 and C4 with average threshold frequencies of > 9 Hz take 
their place in class 6. These are RDS in the style of Julesz (1971), in which the extraction of 
monocular regions and the camouflaging difficulties are combined. 

The only stimulus that does not find a natural place in this classification is the slanted 
cross E2. Its threshold frequency (7 Hz) is surprisingly high. 

Statistically, the differences between threshold frequencies are significant at the 95% 
confidence level for the class 4 versus class 3 or class 5 comparisons, and at the 99% 
confidence level for all other comparisons. 
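
The frequency bands of the six classes can be sketched as a simple lookup. This is a minimal illustration, not the authors' procedure: the function name is ours, the class 3 band (4-6 Hz) is an assumption filling the gap around the "around 5 Hz" quoted above, and the treatment of boundary values is arbitrary. The classes in the text are also defined by stimulus features, not by frequency alone.

```python
from bisect import bisect_right

# Upper bounds (Hz) of classes 1-5; anything above the last bound is class 6.
# The 4-6 Hz band for class 3 is an assumption (the text says "around 5 Hz").
CLASS_UPPER_BOUNDS_HZ = [3.0, 4.0, 6.0, 7.0, 9.0]


def difficulty_class(threshold_hz: float) -> int:
    """Map a threshold frequency (Hz) to a class number from 1 to 6."""
    if threshold_hz <= 0:
        raise ValueError("threshold frequency must be positive")
    return bisect_right(CLASS_UPPER_BOUNDS_HZ, threshold_hz) + 1


# Mean thresholds quoted in the text for block E.
for name, hz in {"E4": 2.9, "E1": 6.7, "E3": 8.0}.items():
    print(f"{name}: {hz} Hz -> class {difficulty_class(hz)}")
```

With this rule a threshold of exactly 7 Hz, such as that of the slanted cross E2, lands on the boundary between classes 4 and 5, which is one way to see why E2 does not find a natural place in the classification.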

In drawing up the summary table in figure 6, it is useful to restate here the caveat 
made at the outset that we have used a wide variety of stereograms that differ on several 
dimensions simultaneously. Hence we are not proposing firm conclusions and instead offer 
the comments above as simply suggestive and, we hope, of interest to those considering the 
multifaceted concept of 'stereoscopic difficulty'. 

While the frequency ranges varied considerably from subject to subject (supplementary 
material, figure S1), the ranking of their responses into classes 1-6 showed little variability.
Indeed, the order was strictly preserved in eleven out of the fifteen LCD subjects, and 
preserved with a single misplacement in the remaining four. 
Figure 6. Shifts in threshold frequencies. Starting from a core of 'basic' stereograms, the threshold
frequencies are shown to increase when the following features are introduced: slant, curvature, 
extraction of monocular regions, depth separation, camouflage, absence of shear disparities. The 
stimuli identification numbers in brackets are those of figures 1-5. The only stimulus that does not 
find its place naturally in this classification is the slanted cross E2. Its associated threshold frequency 
(7 Hz) is surprisingly high. Average threshold frequencies and standard deviations are given for each 
class. 



3.4 Miscellaneous observations 

A few subjects made fine, correct responses at a certain frequency, and coarser responses at 
a slightly lower frequency. This was true for stereogram B2's double curvature, coarsely seen 
as a dome, or for the cylinders B3 and B4, coarsely seen as convex or concave shapes without 
elongation axes. Interestingly, the slanted bars A3 and A4 could be seen at low frequency as
frontoparallel bars. There were also cases of inverted relief for some stimuli at low frequency 
and also cases of alternation between normal and inverted depth, as though the left and 
right images were exchanged at each half cycle. 

A classical theme in stereo vision is depth perception in unfused images (eg Ogle 1953; 
Ziegler and Hess 1997; recent review in Wilcox and Allison 2009). Here, we have a related but 
different phenomenology. For most subjects there is a stage at which depth is perceived in 
a fused image, but where the image oscillates laterally. This motion can be interpreted in 
at least two ways: (i) depth is assigned alternately to the left and the right image, and the 
succession of still representations produces an apparent motion effect, and (ii) there is a 
single three-dimensional representation, but it receives alternate leftward and rightward 
leaning corrections in response to the alternating monocular inputs. 



4 Discussion 

4.1 Introductory comments 

Our initial purpose, in the work presented here, was to explore the potential of the alternating 
frequency technique in clarifying the nature of the computational problems which the brain 
has to solve during stereoscopic processing. As a tool, alternating presentations clearly 
succeeded in producing significant differences in alternation frequency thresholds for 
various classes of stereoscopic stimuli (see section 4.2 below). 

Most of the earlier studies on stereopsis with alternating presentation dealt with
stereomotion effects — in particular, the Pulfrich illusion — or the relationship between 
stereomotion and static stereopsis (eg Morgan and Fahle 2000; Read and Cumming 2005). By 
their nature, these studies probed phenomena that occur within a few milliseconds. In our 
studies, when a stimulus is interpreted in depth at 2.5 Hz, each monocular phase lasts 200 
ms, so we are in quite a different time range. 
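
The conversion from full-cycle alternation frequency to monocular phase duration can be sketched as follows. The function name is ours, and the sketch assumes that each cycle is split equally between the two eyes, with no void or binocular interval in between, which is what the 200 ms figure above implies.

```python
def monocular_phase_ms(full_cycle_hz: float) -> float:
    """Duration of one monocular phase (ms) for a full-cycle frequency (Hz).

    Assumes the cycle is shared equally between the left and right eye,
    with no void or binocular interval.
    """
    if full_cycle_hz <= 0:
        raise ValueError("frequency must be positive")
    cycle_ms = 1000.0 / full_cycle_hz  # one full left + right cycle
    return cycle_ms / 2.0              # half of the cycle per eye


for hz in (2.5, 8.0, 12.4, 20.0):
    print(f"{hz:5.1f} Hz -> {monocular_phase_ms(hz):6.1f} ms per eye")
```

At 2.5 Hz this gives 200 ms per eye, and at 20 Hz it gives 25 ms per eye.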
We emphasize that the reasons why different stimuli require different alternation frequencies
cannot be deduced from our results, and do not seem to be easily accounted for by current
concepts. We believe that our results become more intuitive when a stereoscopic memory
store is incorporated in models of stereoscopic processing (section 4.3). Finally, we argue
for what seems to be an inescapable conclusion from our discussion: that stereopsis involves
several types of qualitatively different computational tasks.

4.2 A tour of stereoscopic difficulties 

In the hierarchy of difficulties which emerges from our work, the easiest stimuli are those 
in which a small number of elements must be positioned in depth, and where there is 
no calculation of slant, curvature, or shape, no monocular regions to deal with, nor any 
overlapping regions to dissociate. The only apparent exception to this rule is stereogram E4 
with small slanted needles. We propose, in line with previous work (eg Ninio 1985), that the 
brain has two ways to appreciate the slant of a small segment. One pathway, indirect but fast, 
would use the different positions in space of the segments' endpoints. A second pathway to 
slant could use orientation disparities (Herbomel and Ninio 1993), which is more direct but 
slower. We conjecture that the endpoint pathway dominates when the segments are short, as 
in the case of stereogram E4. This is corroborated by our informal discussions with subjects, 
indicating that many of them focus on the central part of the stereogram, assign depth to the 
four internal endpoints, then build their interpretation outwards. By contrast, with
longer segments, orientation disparity may dominate, thus slowing down interpretation as 
seen for instance with stereograms A3, A4, and E3. Note that, whichever pathway serves to 
compute a surface's slant, large surfaces take a long time to settle in depth with their finalized 
slant (van Ee and Erkelens 1996, 1999). 

Our measurements add little to the debate on the relative speeds of slant versus curvature 
calculations (see Devisme et al 2008; Rogers and Cagenello 1989). Our easiest stereogram 
with curvature in this study (B1) is interpreted just as fast as the easiest stimuli with slant (A3
and A4).

Depth discontinuities (defined here as step changes in depth) add difficulty (4.9-5.6 Hz 
required for stereograms C1 and C2, against 2.6-2.7 Hz for A1 and A2). However, stereograms
such as B1 and B2 that were designed to represent continuous surfaces may present gaps
in their textures that, according to one reviewer, should be regarded as monocular regions. 
There have been many studies on how the monocular regions are perceived (recent review in 
Harris and Wilcox 2009). Here, we did not investigate the status of these regions in the C1-C4 
stereograms. The monocular regions may facilitate or slow down stereoscopic interpretations 
depending on their texture (Gillam and Borsting 1988; Grove and Ono 1999; Grove et al 2002). 
Whatever extra processing time may be needed to deal with surface discontinuities, it seems 
to be less than the extra processing time required for the separation in three dimensions of 
two needles forming an apparent cross. 

The fact that the textured cylinder with a vertical axis (B3) did not take significantly more 
time to interpret than the textured cylinder with a horizontal axis (B4) deserves discussion. 
In the stimulus with a vertical axis, the disparities are constant along any vertical line: there 
are no shear disparities, and this is known to be detrimental to stereoscopic interpretation 
(Gillam et al 1988). The mere 1 Hz difference we report between stereograms B3 and B4 is therefore
surprising. However, we note that two subjects interpreted stereogram B4 in depth on the
synoptophore, but could not interpret B3, and one of the authors (JN) recently developed a
complete stereo blindness towards stimulus B3. This could be an age-related pathology (B 
Gillam, personal communication). 

Beyond these points of contact with current issues in stereoscopic perception, we believe 
that the alternation frequency methodology has the potential to provide a fresh overview 
on stereoscopic processing. Our set of twenty stereograms generates a nicely spaced scale
of threshold frequencies, from 2.5 Hz to 12.4 Hz. The technique may not be suitable with 
more complex stereograms, because such stereograms would require alternation frequencies 
above the 20 Hz level, and would soon enter a range in which rapid alternations have the 
same effect, due to visual persistence, as entirely static displays. In possession of a scale such 
as that in figure 6, we are in a better position to explore further issues in stereo vision, using 
very simple stereograms, thus with better signal-to-noise ratios than with highly complex
RDS. For instance, the technique could be used with stereograms that involve subjective 
surfaces, illusory disparities, or isoluminant textures. If studies on new stimuli incorporate a 
selection of our stimuli, one will be able to rank all the results on a single scale. This might 
bring an element of systematic order in the field — perhaps the foundation for a modest 
analogue to Mendeleev's table in chemistry. 

4.3 Stereoscopic memory 

When a static stereogram is interpreted in three dimensions, "once the figure has been 
seen, wide conjugate eye movements may be made without 'losing' the global percept" 
(Ramachandran 1976, page 383). The same is true of autostereograms: "once stereopsis 
is achieved, the observer is free to inspect the entire field of the autostereogram without 
losing the depth percept" (Tyler 1983, pages 242-243). This stability of the three-dimensional 
interpretation implies some form of stereoscopic memory. 

Under the alternating presentation conditions, the brain would receive an essentially 
complete information package from the left eye during a 'left eye phase'. This information 
would be sustained for, say, 100 ms after the end of the left eye phase. The information
would progressively deteriorate, as every memory trace does, while the brain receives the 
sensory input from the right eye during the right eye phase. 

During the right eye phase the brain would compare the high-quality information 
received from the right eye with the sustained information from the left eye. Drawing from 
work on visual memory, this sustained information may exist at several qualitatively different 
levels of accuracy, explaining a good deal of the phenomenology. For a very short period, 
and provided that there is no distracting stimulus, the information could live at a high- 
quality level, as in iconic memory (Sperling 1960). After a while, the information would be 
maintained in abridged form, in a short-term visual memory (STVM) store (Phillips 1974). A 
new stimulus would push the content of STVM into a still lower quality store [a "long term
memory" store according to Phillips and Christie (1976), or a "visual working memory store"
according to more recent work].

Under the single exposure to each eye condition, the disparity threshold for stereopsis 
increases with the length of the void interocular interval (eg Ogle 1963). Under the alternating
presentation conditions, the disparity thresholds also increase with the void interval duration 
(eg Ludwig et al 2007; Ross and Hogben 1975; Wist and Gogel 1966). Both observations are 
consistent with the notion of a progressive decay in quality of the information in the sustained 
format. 

Recent studies on visual memory in scene perception provide further ideas for the 
interpretation of our results. In scene perception, the eyes move and grab local information 
in sensory mode — information that is incorporated into a coherent global representation 
of the scene, built from the assembly of memory traces from previous captures of visual 
information (eg review in Hollingworth 2008). There is now some insistence on the idea that 
the global 'coherent' representation is much less volatile than the data from which it was 
constructed (see eg Rensink 2000). Furthermore, a current representation may or may not 
be updated, depending on a number of factors [see the reviews on 'change blindness', eg 
Simons and Rensink (2005)]. 
In the alternating presentation tests, part of the information can be easily grabbed, then
maintained in nonsensory mode. For instance, the wide rectangle and the two wide arcs of 
stereogram A2 can be encoded with approximate position information. The information 
about one image may be used to focus the matching tasks with the incoming sensory 
representation of the other image. On a first cycle, matching would be roughly achieved, 
but not with enough precision to determine the disparity signs. However, the other image 
may now be encoded in sensory mode, with improved positional information. If a slanted 
segment is represented, an approximate matching may assign a disparity sign at one end 
of the segment, but not be precise enough to assign a disparity sign at the other end of the 
segment, so the segment will be perceived as frontoparallel, in some cases protruding and in 
others recessing. This is a very common 'pathological' perception, which disappears when 
the alternation frequency is raised (see section 3.4). 

Other features in the alternating presentation phenomenology may have to do not with
processing time, but with the logic of representation 'updating'. Thus, the intermediate
frequency stage in which the stereogram is interpreted in depth but swings like a tumbler
seems to imply that there is a stable three-dimensional representation, in which the three- 
dimensional calculations are not updated, but that the brain nevertheless signals the 
alternating character of the visual input. 

We might have in our experiments various regimes of stereoscopic processing, and we 
have initiated work to segregate the regimes with greater precision. In his initial studies, 
Ogle (1963) presented left and right images in alternation, and inserted between the two 
presentation intervals either a void interval without stimulus or a short binocular interval 
of simultaneous presentation to the left and right eye of their corresponding images. The 
binocular interval had minimal influence on the results, and we confirmed this feature 
using presentations on a computer screen, with six subjects and twenty stimuli (data not 
shown). However, we also found that, when the binocular interval was increased beyond 
20 ms, a striking phenomenon occurred: extending the binocular intervals by 20 ms could 
allow each monocular presentation of the left or the right image to be extended by 40 ms 
or even longer durations (preliminary results in Rychkova et al 2010). It is thus clear that 
stereopsis is not always a matter of online processing. It probably makes use of several 
levels of representations. Various types of information (eg positional versus orientational 
information, absolute versus relative disparity information) may decay at different rates, 
thus requiring different updating frequencies to produce a stable three-dimensional percept. 

4.4 Clinical implications 

We have used alternating presentations of stereoscopic stimuli at various frequencies as a 
tool to sort out classes of stereograms according to their difficulties. The tool seems to have a 
much higher resolving power than we anticipated. After this first exploration we hope to use 
it to gain insight into the processing requirements for other classes of stereograms. We also 
hope that this work will induce clinicians to consider that stereoscopic competence does not 
reduce to a single scale of stereoscopic acuity. Stereopsis recruits several circuits dedicated 
to the solution of different types of stereoscopic problems, perhaps involving different brain 
areas. The progress made by patients during orthoptic reeducation should be followed on a 
sample of stereograms that require qualitatively different kinds of stereoscopic interpretation 
tasks. 